
Poster: Fizscy Date: Dec 28, 2011 7:42am
Forum: web Subject: Re: Why does the wayback machine pay attention to robots.txt

Yes, I have read it. I'm a long-term contributor at Wikipedia, and I deal with Canadian and American copyright law on a regular basis because of that.

"The purpose and nature of the use.

If the copy is used for teaching at a non-profit institution, distributed without charge, and made by a teacher or students acting individually, then the copy is more likely to be considered as fair use."

The web archive is not a search engine crawler or similar robot, yet it seems to follow disallow requests for search crawlers just the same.
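
To see the mechanics, here is a minimal Python sketch of how any robots.txt-compliant crawler decides whether it may fetch a page (example.com is a placeholder; "ia_archiver" is the user-agent string the Internet Archive's crawler has historically used):

```python
# Minimal sketch of how a robots.txt-compliant crawler decides whether
# it may fetch a page. Standard library only; example.com is a placeholder.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()  # fetch and parse the site's current robots.txt

# A rule like "User-agent: *" / "Disallow: /" written with search
# engines in mind blocks an archive crawler exactly the same way,
# because "*" matches every compliant bot.
print(rp.can_fetch("ia_archiver", "http://example.com/some/page.html"))
```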

Second, adding that robots.txt file has absolutely ZERO effect on the copyright status or the fair use of the site. Nothing, nada, zip, zilch.

Third, domains change hands. Using a robots.txt file today to erase all previous copies in the archive is ridiculous, especially since the copies may be of a different site.

The archive should only exempt sites whose owners have specifically requested, by email to archive.org, that their website not be indexed.


Poster: jory2 Date: Dec 28, 2011 8:39am
Forum: web Subject: Re: Why does the wayback machine pay attention to robots.txt

"Yes I have read it. I'm a long-term contributor at wikipedia and I deal with Canadian and American copyright law on a regular basis because of that."

Good, then I'll assume you're aware that "fair use" is restricted to the U.S. Copyright Act and not the Canadian Copyright Act. And for what it's worth, my field of study is Copyright and Intellectual Property Law.

"The purpose and nature of the use."

I'll assume you understood that to mean that not all Works can be argued under "fair use"? Or did you apply your own special meaning to the fair-use clause of the U.S. Copyright Act(s)?

"If the copy is used for teaching at a non-profit institution, distributed without charge, and made by a teacher or students acting individually, then the copy is more likely to be considered as fair use."

I'll assume you're aware this website is privately owned and operated, receives private funds on top of government funds, and of course has the paid Archive-It service. This website is considered a non-profit commercial website. It is not legally considered a Library, and because of that it cannot invoke the limitations for Libraries as detailed in both the U.S. and Canadian Copyright Acts.

"The web archive is not a search engine crawler or similar robot, yet it seems to follow disallow requests for search crawlers just the same."

What's your point?

"Second, the adding of that robots.txt has absolutely ZERO effect on the copyright and the fair use of the site. Nothing, nada, zip, zilch."

I'll assume you understood that content owners are not legally obligated to put a robots.txt file on their sites to prevent copyright violations. Unless you have your own special meaning for that as well?

"Third, domains change hands. Using a robots.txt file today to erase all previous copies on the archive is rediculous, especially since the copies may be of a different site."

This website is not simply copying the name of the domain; it is making copies of the intellectual property on privately owned websites without the express permission of the rightful copyright owners.

"The archive should only exempt sites that have specifically requested, to archive.org by email, that their website not be indexed."

This website should only be making copies of websites it received permission to copy in the first place.


Poster: Thestral Date: Apr 25, 2014 1:12pm
Forum: web Subject: Re: Why does the wayback machine pay attention to robots.txt

@jory2

You wrote...

"This website should only be making copies of websites that they received permission copy in the first place."

You do realize that making copies is literally the way these here internets work, don't you? You cannot view a website without making a copy of it, be it a permanent copy (as here) or a temporary one (as in your browser cache and temp files). If your notion were made reality, there'd be no internet to archive, as no one could "copy" web pages to view them.
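
In case anyone doubts that, here is a trivial Python sketch (standard library only; example.com is a placeholder) of what every browser does on every page view:

```python
# Merely "viewing" a page means copying its bytes to your machine.
# This does, in miniature, what a browser does on every page load.
import urllib.request

with urllib.request.urlopen("http://example.com/") as resp:
    page_bytes = resp.read()  # a complete local copy now sits in memory

print(f"{len(page_bytes)} bytes copied just to view the page")
```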

Aside from that, you neatly avoided the main issue. The robots.txt policy here makes it possible for people with no rights over certain intellectual property to literally wipe the last vestiges of said IP from the face of the web, just because they happen to have acquired a domain name that once belonged to the rightful IP holder.

That is a huge issue, and it needs a resolution that restores and protects these wrongfully removed archives while still allowing sites to non-destructively exempt themselves from archival going forward.
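
To make the mechanism concrete, here is a simplified model of the playback policy being objected to. This is a sketch of the behaviour described in this thread, not the Wayback Machine's actual code; snapshots_visible is a hypothetical helper, and "ia_archiver" is the user-agent the Archive's crawler has historically used:

```python
# Simplified model (NOT the actual Wayback Machine code) of the
# retroactive-exclusion policy described above: playback consults the
# domain's *current* robots.txt before serving *old* snapshots.
import urllib.robotparser

ARCHIVE_USER_AGENT = "ia_archiver"

def snapshots_visible(domain: str) -> bool:
    """Hypothetical helper: may archived snapshots of `domain` be served?"""
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"http://{domain}/robots.txt")
    try:
        rp.read()  # whoever controls the domain *today* wrote this file
    except OSError:
        return True  # robots.txt unreachable: default to serving
    return rp.can_fetch(ARCHIVE_USER_AGENT, f"http://{domain}/")

# If a domain changes hands and the new owner publishes
#   User-agent: ia_archiver
#   Disallow: /
# this returns False, and every snapshot of the *previous* owner's
# site vanishes from playback, which is exactly the complaint here.
```

Because the decision is driven entirely by the robots.txt the domain serves today, whoever holds the domain now controls the visibility of every past snapshot, including snapshots of sites they never owned.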



This post was modified by Thestral on 2014-04-25 20:12:02


Poster: Mr Cranky Date: Dec 28, 2011 11:28am
Forum: web Subject: Re: Why does the wayback machine pay attention to robots.txt

What is your take on the Authors Guild's stance on robots.txt?


Poster: jory2 Date: Dec 28, 2011 12:13pm
Forum: web Subject: Re: Why does the wayback machine pay attention to robots.txt

I'm not familiar with the Authors Guild's stance or personal opinions on robots.txt files.
I am curious, though: did you find it to be an interestingly humorous read, like the misunderstandings that play out in the forums on this website with respect to copyright, fair use, and the legal definitions of libraries?


Poster: jory2 Date: Dec 30, 2011 8:58am
Forum: web Subject: Re: Why does the wayback machine pay attention to robots.txt

I have been looking for the Authors Guild's stance on robots.txt files, but I haven't found much on it.

I did come across the Internet Archive's stance on robots.txt files, however.

"Starting January 2010, Archive-It is running a pilot program to test a new feature that allows our partners to crawl and archive areas of sites that are blocked by a site's robots.txt file."

"Partners who have a need for this feature should contact the Archive-It team to let us know what sites and why you would like to use this feature. It would be helpful to know if you have previously contacted the site owner about allowing our crawler to crawl their sites, and what their response was (if any). We ask our partners to use this feature only when necessary. Also, please keep in mind that many things that are blocked by robots.txt are parts of a site that you wouldn't necessary want to archive, so please be sure to review the urls that are blocked in the 'Hosts Report' for your crawl to determine if you need this feature or not."

Oddly enough, this stance seems to be a complete 180 from this website's TOS.
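
In crawler terms, the pilot amounts to a per-site override, roughly like this hypothetical Python sketch (not Archive-It's actual implementation; the host name, user-agent, and may_crawl helper are invented for illustration):

```python
# Hypothetical sketch of the per-site override the Archive-It pilot
# describes: obey robots.txt by default, but skip the check entirely
# for hosts a partner has been approved to crawl. Not Archive-It's
# actual implementation; names are invented for illustration.
import urllib.robotparser

ROBOTS_OVERRIDES = {"approved-partner-site.example"}  # invented host

def may_crawl(host: str, url: str, user_agent: str = "archive-crawler") -> bool:
    if host in ROBOTS_OVERRIDES:
        return True  # pilot feature: archive even robots.txt-blocked areas
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"http://{host}/robots.txt")
    rp.read()
    return rp.can_fetch(user_agent, url)
```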

